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Abstract: We consider the non-parametric maximum likelihood estimation 
in the class of Polya frequency functions of order two, viz. the densities with 
a concave logarithm. This is a subclass of unimodal densities and fairly rich 
in general. The NPMLE is shown to be the solution to a convex programming 
problem in the Euclidean space and an algorithm is devised similar to the iter- 
ative convex minorant algorithm by Jongbleod (1999). The estimator achieves 
Hellinger consistency when the true density is a PFF2 itself. 



1. Introduction 

The problem of estimating a unimodal density and its mode has attracted a wide 
interest in the literature, begin ning with the work of Barlow Prakasa Rao, 
0, Robertson ^ and Wegman [Iff, []]| and continuing through [§], 0, 0, 0, 
and 0, who can be consulted for further references. Asymptotic properties of the 
maximum likelihood estimators have been developed but may be messy and suffer 
from some inconsistency in the region near the mode. Kernel estimation can avoid 
the inconsistency, but must confront the choice of a bandwidth. Here we investigate 
a smaller, easier version of the problem, estimating a Polya frequency function 
of order two [hereafter PFF 2 ]. By PFF 2 , we mean a density / whose logarithm 
is concave over the support of /. Equivalently, a function / whose logarithm is 
concave in the sense of Rockafellar [lOj. Such functions are automatically unimodal. 
Moreover, an estimated PFF 2 supplies its own estimate of the mode. There is no 
need to estimate the mode seperately. 

The non-parametric maximum likelihood estimator [hereafter, NPMLE] for this 
problem is derived in Section [2] and shown to be Hellinger consistent in Section [3] 
Simulations are reported in SectionQJ Rufibach and Duembgen and Walther [lj| 
adopt a similar approach but obtain different results by different methods. 

2. The NPMLE 

Let T be the class of PFF 2 densities and suppose that a sample X\ , . . . , X n 

is available. The problem is to estimate / non-parametrically. Letting —00 < x\ < 

x 2 < • • • < x n < 00 denote the order statistics the log-likelihood function is 

n 

(2.1) *(ff) = 5>gk(*i)]. 
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This is to be maximized with respect to g £ T . Equivalently, this is to be maximized 
with respect to non-negative g for which log(<?) is concave and 

/oo 
g(x)dx = 1. 
-OO 

If g £ T, write 0, = log[p(ar i )]. Then \og\g(x)} > [(x - + (x % - x)9 i ^ 1 ]/ (xi - 

Xi-i) for Xi-i < x < Xi and, therefore, 

(2.3) / g{x)dx > [ — ](xi-x i - 1 ) 



for i = 2, ... ,n. It follows easily that i2.1\) is maximized when g £ T has support 
[xx,x n ] and \og(g) is Q piecewisc linear function with knots at x.\^ . . . -,x n . For if 
g £ T, let g° be a function for which log(g°) is piecewise linear, g°(xi) = g(xi), i = 
1,. ..,n, and <7°(x) = for x ^ [xi,x„]. Then, using (|2.3p . > g°(a:) for all x 
with equality for x £ {xi, . . . , x„}, and therefore, 



1 = g(x)dx > / g(x)dx > / g°(x)dx = / g°(x)dx. 

So, there is a c < 1 for which g* = g°/c G J* and £(#) < £(g*). 

Thus, finding the NPMLE may be reformulated as a maximization problem in 
M n . Let JC be the set of 6 = [9 U 6 2 , . . . , n ] £ ttt n for which 



for i = 2, . . . ,n — 1. The reformulated maximization problem is to maximize #i + 
■ • ■ + n among 9 £ /C subject to the constraint 



(2-4) E L _g. ](xi-x < _ 1 ) = l. 

Introducing a Lagrange multiplier, it is necessary to maximize 

(2.5) vw-E^- A E[ fl ._ fl . ](^-^-i) 



i=l i=2 



subject to (|2.4p . for appropriate A, for G /C. 
The partial derivatives of i/ 7 are 



7 :2 17 j i ,7 2 - Pi) 2 



^r = 1 " A[ ^ 1 ~(^^ ](a;2 - a;i) 



e e 2 e e 3 _ e 2 



-0 2 ) 2 



(x 3 - x 2 ), 



dtp(e) _ r e 6 i e B i - e^- 1 -, 

~w 1 _ (^-^-i)^ (Xj _ Xj - l} 

e 0j e® i+1 — e 6j 

- x l - t~ — T + m Ala ~ Xj 
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and 



Oft,, 1 A ^ -0 • Id -0 A2\( Xn X n-l)- 



At the maximizing 9, Vtp(6) 1 = 0, so that 

g#l g#2 g#l 

n = A[ "^^ + (^F ](a;2 - Xl) 
+ A M [g— ^— - ( ..-._ l)2 ] (»i " «i-0 



i=2 ^-<7j-i W -I7 j-i) 2 

gS„ g9„ _ gS n _i 

A [/j 7 77] 7 \a] ( x ™ — &n-l) 



There is some cancelation here, and 



3-1 



A H fl._fl. _ 



7., — {7i_l 
3 = 2 J J 1 

So, if (12. 4[) is to be satisfied, then 

(2.6) A = n. 
Now let 

w\ = 9i 
Wj = — — 

for j = 2, . . . ,n. Then, 

(2.7) w 2 > • • • > u„ 
and 

3 

(2.8) 6>j —oji+ ^2(xj - Xj-i)uj 



i=2 

for j = 1, . . . , n, where an empty sum is to be interpreted as 0. Let 
(2.9) 0(w) = 

and A:Ej = #j — Xj—i. Then 

n n 
3=1 i=2 

3=2 3 
n 

3=2 
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where 

, , e* - 1 
p(t) = — 

So, 

n n 

4>{ui) = nu>i + — i + l)Aa;,a;, — re ^""^ e® 1 ^ 1 p^AxjU^Axj. 

i=2 j=2 

Using (|2.8|) , it follows that at the maximizing 9 (or w) 

_ n _ n e 6 *- 1 p(AxjWj)A%j = 0. 

So, we are led to the problem of maximizing <fi(u>), subject to (|2.7p and d<f>{u>) / dui\ = 
0. Again using (|2.8[) , the latter condition may be written 

n ._ 

(2.10) e-" 1 = J2 e ^= 2 ^^piAxjtu^Axj. 

To solve this, we need a version of the iterative concave majorant algorithm, similar 
to that of Jongbloed [g]. We start with an initial value u>° = (cj°, . . . , for which 
(|2.7p and (|2.10p are satisfied. One such choice is to assume / is a normal density 
and estimate its mean and variance from the data. The corresponding oj° can be 
computed using a scaled piecewise linear version of log/. Let k = 0. 

The idea behind our algorithm is to replace the concave function 4> locally near 
uj k by a quadratic form of the type 

q(u, oJ k ) = i(« - w fe + r(w fe )- 1 V0(cj fe )) t r(w fc )(a; - uo k + r(cj fe )- 1 V<^(w fc )) 

where T is a diagonal matrix with entries d 2 cf>(u>) / dui 2 . and V<^ is the gradient vector. 
This maximization has a geometric solution given by the left hand slopes of the 
concave majorant of the data cloud: (J2\=i Yli^ii^i^i + &<])) where 

5I ' = i 0HU ^' 



i2.li) =_^( w )| t 



•'it 

This can be also characterized explicitly as, 



r+1 . + 

£J- = mm max 



2<j<ii<k<n rfr 
L^ih—j h 

Finally, to satisfy $ZJ$ , 



- 1 

< +i = - log e s:: 2 ^^^(a^^+^a^] . 

i=2 



To implement this, we need to compute the partial derivatives {d / dLOk)4>{oj) for 
= 1, . . . , n. Clearly, 

g#jj = Q 
9wi 



Polya frequency function 



213 



Table 1 

Angular distances from the center R and line-of-sight 
velocities V of stars in Fornax. 



Star 


Date 


R 


V 






arcmin 


km/sec 


Fl-1 


29 Nov. 1992 


14.5 


55.8 


Fl-2 


29 Nov. 1992 


20.8 


42.9 


F9-8025 


15 Doc. 2002 


42.3 


69.2 



and 

^ = (n - k + l)(xk - Xk-i) - n Axke ej - 1 p(AxjOJj)Axj 

UJk j=k+l 

-ne ek -ip'(Ax k uj k )Ax 2 k 
for k = 2, . . . , n; and the second derivatives are 



- n 



« j=k+l 

+ e e ^p"{Ax k uj k )Axl 

for k = 2, . . . , n. To achieve stability, we modify the algorithm using a line search 
method as follows. It is not certain that the new point uj will have a larger value of 
<j>. Therefore, we need to perform a binary search along the line segment joining u> k 
and ui to get a point uu k+1 such that 4>{u> k+1 ) > 4>(u> k ). Finally, we stop the iteration 
when two consecutive iterates have very close <fi values. 

Example. Walker et. al. [l3| have reported the line-of-sight velocities of 178 stars 
in the Fornax dwarf spheroidal galaxy. The nature of the data is reported in Table 
[TJ The full data set can be found in [13(. Figure [T] displays the estimated density of 
line-of-sight velocity, assuming that the later is a Polya frequency function. The 
sharp peak at the mode is, unfortunately, an artifact of the method. 



3. Consistency 

Let F be a distribution function with density / and suppose throughout that 

(3.1) - oo < / log(/)dF < oo. 

J Ml 

Let X\,Xi, ■ ■ ■ ^ lnd F; and let Ff be the empirical distribution function. 

F # (x) = #{k<n:X k <x} _ 

Further, let h denote the Hellinger distance between densities, 

(3.2) h 2 ( 9l ,g 2 )= [ {^g 1 -^g 2 fdx = 2[l- [ ^gidx}. 



The purpose of this section is to prove: If f is a PFF2 for which h3.1\) is satisfied 
and f n is the maximum likelihood estimator, then Ymin^,^ h 2 (f, /„) = w.p.l. 



244 



J. Kumar Pal, M. Woodroofe and M. Meyer 

Fornax data 



0.03 



0.025 




40 60 80 100 120 140 

line-of-sight-velocity 



Fig 1. Estimated density of velocities for 178 stars. 



Lemma 1. If f and g are densities and b > 0, then 

Jjog(^j)dF<e{b)-h 2 (f,g), 



where 



Proof. In this case, 



dF. 



! b + g 
b + f 



dF-l 



< 2 



:dF 



9 



/Hp+/ Jn\b + f 
<e(b) + 2\ f y/gjdx-l 
= e{b)-h\f,g). 



dF-l 



□ 
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Lemma 2. Let < b, c < oo. If g is unimodal and swp x g(x) < c, then 



2 15 



(3.3) |/ log(6 + fl )d(F#-F)| <2sup|if (x)-F(z)|log(l 
Proof. Integrating by parts, the left side of (|3.3[) is 

| f (F-F*)dlog(b + g)\, 

JR. 



which is at most the right side. 



□ 



Now let U be a class of unimodal densities; let l n denote the log-likelihood 
function, so that 



£ n (g) =f2log[g(X i )} =n f \og(g)dF* 



for g £ U; and let /„ be the MLE in U (assumed to exist). 
Theorem 3.1. If f € U and C n = sup^, f n (x) satisfies, 



(3.4) 
then 



logC„ = sup log (a;) = o[- — —] w.p.l, 

log(n) 



lim h(f,f n ) = w.p.l 

Proof. If / G U, then 

< t n (f n ) ~ £ n (f) = n\ f \og{f n )dF*-\ \og(f)dF* 

L JR 

So, if b > 0, then 



< / log(6 + f n )dF* - J log(f)dF* = I n + II n + III n , 



where 



and 



I n = / log(6 + f n )d(F# — F), 

JR. 

III n = [ log(b + f)dF- [ \og(f)dF* 

JR. JR. 



With C n as in (O, 



\I n \ <2sup|F#(x)-^(x)|log(l 



C n 



w.p.l 



as n — > oo, by Lemma[2]and the consistency of Ff. Also, 

n n <e(b)~h 2 (f,f n ), 
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by Lemma [T] and 

Urn UI n = / [log(6 + /) - log(f)]dF, 



by the Strong Law of Large Numbers. So, w.p.l, 



lim sup/* 2 (/,/„) < f pog(6 + /)-]og(/)]dF + e(6), 

which approaches zero as b — ► 0. □ 
Lemma 3. Let g be a PFF2 density. I/O < g(a) < g(b), then 



L .9( 

Proof. There is no loss of generality in supposing that a < b. Let h — log(g). Then 

h(x) > h a + (^— -)[h b -h a ], 
b — a 

where h a and h b were written for h(a) and h(b). So, 

f b x - a 
g(x)dx> / exp{h, a + (-r ) [h b - h a ] } dx 

h b -h a 
/, , ft" .9a 

= 0- ah 



log(ffb/g a ) ' 

Of course, g(x)dx < 1, and g(x) > g a for a < x < b. So, (/ a < 1/(6 — a) and 

9 b < 9a + (^-) log(^) < (^-){1 + bg } , 

as asserted. □ 
Lemma 4. If a,b,x > and x < a log(x) + b, then x < 2a log(2a) + 2b. 
Proof. If x < a log(ir) + 6, then 

x < alog(2a + x) + b < a[log(2a) H 1 + b = -x + alog(2a) + 6, 

2a 2 

from which the lemma follows immediately. □ 

Now let f n be the MLE in the class of PFF2 densities. Then /„ attains its 
maximum at an order statistic, say x m . If m < n/2, let g = [3?i/4j + 1; and if 
m > n/2, let g = [n/4\. 

Theorem 3.2. If f is a PFF2 density, then 

(3.5) C := sup/ n (x m ) < 00 w.p.l. 

n>l 
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Proof. Let K n = g or n — q, accordingly as to > n/2 or < n/2. Then K n > re/4, 
and 



(3.6) 



\Xm %q\ 

by Lemma [3] If / is a PFF 2 , then 

in(f) < in(fn) < K n l g[f n (x q )] + (n - K n ) \og[f n {x m )]. 

So, 

log [k^-] < ±- log[/„(x m )] - ±tn{f) 



fn(Xq) K n 



< 4 



log[/ n (x m )] - -£ n (f) 



- r f 1 + 41og[/„(x m )] - -£„(/)) 



: log [/n(a:m)] + B« , 



rn q \ 



(3.7) 

Combining (J3J)]) and ([3J| 

fn {Xni ) ^ 

where 
and 
So, 

/»W <2A„log(2A n ) + 2B n , 

by Lemma 21 Here sup n A n < oo and sup„ B n < oo, by (|3.1[) and the choices of to 
and g, establishing the theorem. □ 

From the choice of to = m n , (|3 . 5[) may be written C — sup n sup.,, f n (x). 
Corollary 1. IflA is the class of PFF2 densities and f G U, then limn^oo h 2 (f, 

fn) = W.p.l. 



B,, 



\Xq Xm\ 



1 - -US) 



4. Simulations 

To assess the speed of convergence, we conducted simulation study for different well- 
known members of the PFF2 class. The densities we sampled from are: Gaussian, 
Double exponential, Gamma (with shape parameter 3 and scale parameter 1), Beta 
(with shape parameters 3 and 2) and Weibull (with parameters 3 and 1). Table |4] 
gives us the summary for approximate Hellinger distances between the estimate and 
the true underlying density with increasing sample sizes 50, 100, 200, 500, 1000. We 
also graphically show how the estimators look like in some of this examples. Figure[5] 
shows the corresponding plots for Normal, Double Exponential and Gamma for 
sample sizes 50, 100, 200 respectively. 
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Table 2 

The Monte Carlo estimates of finite- sample Hellinger distances, for sample 
size n = 50, 100, 200, 500, 1000 and number of replications M = 500 
for five different log-concave densities. The upper figure is the 
estimate and the lower is one standard deviation. 



Sample 


Normal 


Double 


Gamma 


Beta 


Weibull 


size 


(0,1) 


Exponential 


(3,1) 


(3,2) 


(3,1) 


50 


0.1658 


0.1548 


0.1518 


0.1823 


0.0854 




± 0.0531 


± 0.0466 


± 0.0447 


± 0.0582 


± 0.0263 


100 


0.0680 


0.1146 


0.0934 


0.0988 


0.0947 




± 0.0218 


± 0.0426 


± 0.0305 


± 0.0363 


± 0.0319 


200 


0.0624 


0.0956 


0.0532 


0.1166 


0.0312 




± 0.0167 


± 0.0252 


± 0.0164 


± 0.0360 


± 0.0067 


500 


0.0290 


0.0626 


0.0088 


0.0766 


0.0347 




± 0.0104 


± 0.0203 


± 0.0029 


± 0.0214 


± 0.0115 


1000 


0.0028 


0.0139 


0.0019 


0.0170 


0.0274 




± 0.0010 


± 0.0031 


± 0.0007 


± 0.0031 


± 0.0089 




Fig 2. The estimated log-concave density for different simulation examples. The sample sizes are 
50,100 and 200 respectively for first, second and third columns. The three rows correspond to 
simulations from a Normal(0,l), a double- exponential and a Gamma(3,2) density. The bold one 
corresponds to the true density and the dotted one is the estimator. 
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